-
Notifications
You must be signed in to change notification settings - Fork 541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MQE: track number of processed samples in each query #10232
Conversation
24164eb
to
344bfc7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for being so quick on this. Looks good to me.
pkg/streamingpromql/operators/selectors/instant_vector_selector.go
Outdated
Show resolved
Hide resolved
// QueryStats tracks statistics about the execution of a single query. | ||
// | ||
// It is not safe to use this type from multiple goroutines simultaneously. | ||
type QueryStats struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?
It would also make it easier to implement TotalSamplesPerStep
if we want to support that too (which I think we do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?
Given we don't support anything other than TotalSamples
in MQE, I wanted to make this clear in the code by using a struct that only had a field for TotalSamples
.
It would also make it easier to implement
TotalSamplesPerStep
if we want to support that too (which I think we do)
I don't think we want to do this unless there's a specific need for it - per-step stats are considered experimental and disabled by default in Prometheus, and are not possible to enable in Mimir as far as I can see. The docs for this also state that the value for each step should be the same as if the query was run as an instant query, so anyone who wanted this information could run the query as an instant query for the step(s) they're interested in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?
Given we don't support anything other than
TotalSamples
in MQE, I wanted to make this clear in the code by using a struct that only had a field forTotalSamples
.
I would be happy with a comment in query.go, but I'm not opposed to a separate struct.
It would also make it easier to implement
TotalSamplesPerStep
if we want to support that too (which I think we do)I don't think we want to do this unless there's a specific need for it - per-step stats are considered experimental and disabled by default in Prometheus, and are not possible to enable in Mimir as far as I can see. The docs for this also state that the value for each step should be the same as if the query was run as an instant query, so anyone who wanted this information could run the query as an instant query for the step(s) they're interested in.
Fair enough, but also thinking of any future stats. I don't mind a separate struct though.
require.Equal(t, testCase.expectedTotalSamples, prometheusCount, "invalid test case: expected samples does not match value from Prometheus' engine") | ||
|
||
mimirCount := runQueryAndGetTotalSamples(t, mimirEngine, testCase.expr, testCase.isInstantQuery) | ||
require.Equal(t, testCase.expectedTotalSamples, mimirCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also compare the samples loaded as part of our test gauntlet if we expect it to be the same in all cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently difficult due to the optimisation in prometheus/prometheus#14097, as Prometheus' engine sometimes skips loading data for histograms if it's not needed. MQE does not yet have the same optimisation, so there are some expected differences in the total sample count from the two engines in some cases.
Given the tests in TestQueryStats
, and the fact the statistics are informational and may differ between engines in the future due to other optimisations, I'm tempted to leave this as-is.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to leave it out if that's the case.
It might be interesting to see some much larger queries/time ranges etc to see if we are returning consistent results. We could perhaps use the same data generated from the benchmarks to create some large queries (of just floats since NH will be different). Then have a flag to RequireEqualResults
to compare them etc.
This isn't a blocker.
56ffb11
to
5dea7f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like how we have to iterate through the buffer vs counting the samples on filling the buffer, but otherwise lgtm. Is it possible to see a benchmark of range-vectors of histograms before approving?
Benchmarks show there is no statistically significant difference with this PR compared to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking 🚀
What this PR does
This PR adds support for tracking the number of samples processed in a query evaluated by MQE.
Which issue(s) this PR fixes or relates to
Resolves #10138
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.